66 research outputs found

    On Special k-Spectra, k-Locality, and Collapsing Prefix Normal Words

    The domain of Combinatorics on Words, first introduced by Axel Thue in 1906, covers by now many subdomains. In this work we are investigating scattered factors as a representation of non-complete information and two measurements for words, namely the locality of a word and prefix normality, which have applications in pattern matching. In the first part of the thesis we investigate scattered factors: A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist (potentially empty) words $u_1, u_2, \ldots, u_n$ and $v_0, v_1, \ldots, v_n$ such that $u = u_1 u_2 \cdots u_n$ and $w = v_0 u_1 v_1 u_2 v_2 \cdots u_n v_n$. First, we consider the set of length-$k$ scattered factors of a given word $w$, called the $k$-spectrum of $w$ and denoted by $\mathrm{ScatFact}_k(w)$. We prove a series of properties of the sets $\mathrm{ScatFact}_k(w)$ for binary weakly-0-balanced and, respectively, weakly-$c$-balanced words $w$, i.e., words over a two-letter alphabet where the number of occurrences of each letter is the same, or, respectively, one letter has $c$ occurrences more than the other. In particular, we consider the question which cardinalities $n = |\mathrm{ScatFact}_k(w)|$ are obtainable, for a positive integer $k$, when $w$ is either a weakly-0-balanced binary word of length $2k$, or a weakly-$c$-balanced binary word of length $2k - c$. Second, we investigate $k$-spectra that contain all possible words of length $k$, i.e., $k$-spectra of so-called $k$-universal words. We present an algorithm deciding whether the $k$-spectra for given $k$ of two words are equal or not, running in optimal time. Moreover, we present several results regarding $k$-universal words and extend this notion to circular universality that helps in investigating how the universality of repetitions of a given word can be determined.
We conclude the part about scattered factors with results on the reconstruction problem of words from scattered factors that asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^*$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b) + 1$ scattered factors of the form $a^i b$, where $|w|_a$ is the number of occurrences of the letter $a$ in $w$. Moreover, we generalise the result to alphabets of the form $\{1, \ldots, q\}$ by showing that at most $\sum_{i=1}^{q-1} |w|_i (q - i + 1)$ scattered factors suffice to reconstruct $w$. Both results improve on the upper bounds known so far. Time complexity bounds on reconstruction algorithms are also considered here. In the second part we consider patterns, i.e., words consisting of not only letters but also variables, and in particular their locality. A pattern is called $k$-local if on marking the pattern in a given order never more than $k$ marked blocks occur. We start with the proof that determining the minimal $k$ for a given pattern such that the pattern is $k$-local is NP-complete. Afterwards we present results on the behaviour of the locality of repetitions and palindromes. We end this part with the proof that the matching problem also becomes NP-hard if we consider not a regular pattern (for which the matching problem is efficiently solvable) but repetitions of regular patterns. In the last part we investigate prefix normal words, which are binary words in which each prefix has at least the same number of 1s as any factor of the same length. First introduced in 2011 by Fici and Lipták, the problem of determining the index (number of equivalence classes for a given word length) of the prefix normal equivalence relation is still open.
In this paper, we investigate two aspects of the problem, namely prefix normal palindromes and so-called collapsing words (extending the notion of critical words). We prove characterizations for both the palindromes and the collapsing words and show their connection. Based on this, we show that still open problems regarding prefix normal words can be split into certain subproblems
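
The two central definitions of this abstract, the $k$-spectrum and prefix normality, can be illustrated with a small brute-force sketch. It is not the thesis's algorithm: the function names `scat_fact_k` and `is_prefix_normal` are hypothetical, words are assumed to be Python strings, and binary words are assumed to use the characters `0` and `1`. The running time is exponential or quadratic, so it is only meant for short words.

```python
from itertools import combinations


def scat_fact_k(w: str, k: int) -> set:
    """Return the k-spectrum of w: all distinct length-k scattered factors.

    Brute force: every choice of k positions (in increasing order) yields
    one scattered factor; duplicates collapse in the set.
    """
    return {"".join(letters) for letters in combinations(w, k)}


def is_prefix_normal(w: str) -> bool:
    """Check that no factor of w has more 1s than the prefix of equal length."""
    n = len(w)
    # ones[i] = number of 1s in w[:i]
    ones = [0]
    for ch in w:
        ones.append(ones[-1] + (ch == "1"))
    for length in range(1, n + 1):
        most_ones_in_factor = max(
            ones[i + length] - ones[i] for i in range(n - length + 1)
        )
        if most_ones_in_factor > ones[length]:
            return False
    return True
```

For instance, `scat_fact_k("aabb", 2)` lacks `ba`, so `aabb` is not 2-universal, while `abab` contains all four binary words of length 2.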

    $\alpha$-$\beta$-Factorization and the Binary Case of Simon's Congruence

    In 1991 Hébrard introduced a factorization of words that turned out to be a powerful tool for the investigation of a word's scattered factors (also known as (scattered) subwords or subsequences). Based on this, first Karandikar and Schnoebelen introduced the notion of $k$-richness and later on Barker et al. the notion of $k$-universality. In 2022 Fleischmann et al. presented a generalization of the arch factorization by intersecting the arch factorization of a word and its reverse. While the authors merely used this factorization for the investigation of shortest absent scattered factors, in this work we investigate this new $\alpha$-$\beta$-factorization as such. We characterize the famous Simon congruence of $k$-universal words in terms of $1$-universal words. Moreover, we apply these results to binary words. In this special case, we obtain a full characterization of the classes and calculate the index of the congruence. Lastly, we start investigating the ternary case, present a full list of possibilities for $\alpha\beta\alpha$-factors, and characterize their congruence
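
The arch factorization that this abstract builds on can be sketched as a greedy left-to-right scan: each arch is the shortest block containing every letter of the alphabet, and the remainder (the rest) misses at least one letter. The sketch below covers only this classical factorization, not the $\alpha$-$\beta$-factorization itself. The names `arch_factorization` and `universality_index` and the explicit `alphabet` parameter are assumptions made for illustration, and the input word is assumed to use only letters from `alphabet`.

```python
def arch_factorization(w: str, alphabet: str):
    """Split w greedily into arches plus a rest.

    Each arch is closed as soon as every letter of the alphabet has been
    seen since the previous arch ended. Assumes w uses only letters
    from `alphabet`.
    """
    sigma = set(alphabet)
    arches = []
    seen = set()
    start = 0
    for i, ch in enumerate(w):
        seen.add(ch)
        if seen == sigma:
            arches.append(w[start:i + 1])
            start = i + 1
            seen = set()
    return arches, w[start:]


def universality_index(w: str, alphabet: str) -> int:
    """Number of arches; w is k-universal exactly when this is at least k."""
    arches, _rest = arch_factorization(w, alphabet)
    return len(arches)
```

Applying the same function to the reversed word gives the arch factorization of the reverse, which is the second ingredient the abstract mentions.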

    Local Patterns

    A pattern is a word consisting of constants from an alphabet $\Sigma$ of terminal symbols and variables from a set $X$. Given a pattern $\alpha$, the decision problem whether a given word $w$ may be obtained by substituting the variables in $\alpha$ by words over $\Sigma$ is called the matching problem. While this problem is, in general, NP-complete, several classes of patterns for which it can be efficiently solved are already known. We present two new classes of patterns, called $k$-local and strongly-nested, and show that the respective matching problems, as well as membership, can be solved efficiently for any fixed $k$
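
To make the matching problem concrete, here is a minimal backtracking sketch. It is a naive exponential-time matcher, not the efficient algorithm for $k$-local or strongly-nested patterns. Several choices are assumptions for illustration: the function name `matches`, the encoding of a pattern as a string whose variable symbols are listed in `variables`, and the restriction to non-empty substitutions.

```python
def matches(pattern: str, word: str, variables: set) -> bool:
    """Decide whether `word` arises from `pattern` by substituting each
    variable consistently with a non-empty word (assumed convention)."""

    def go(i: int, j: int, subst: dict) -> bool:
        # i: position in pattern, j: position in word
        if i == len(pattern):
            return j == len(word)
        sym = pattern[i]
        if sym not in variables:
            # Terminal symbol: must match the next letter of the word.
            return j < len(word) and word[j] == sym and go(i + 1, j + 1, subst)
        if sym in subst:
            # Variable already bound: its image must occur here.
            image = subst[sym]
            return word.startswith(image, j) and go(i + 1, j + len(image), subst)
        # Unbound variable: try every non-empty image starting at j.
        for end in range(j + 1, len(word) + 1):
            subst[sym] = word[j:end]
            if go(i + 1, end, subst):
                return True
            del subst[sym]
        return False

    return go(0, 0, {})
```

For example, with `variables={"X"}` the pattern `XaX` matches `bab` (taking `X` to be `b`), whereas `XX` does not match `aba`.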

    The Edit Distance to k-Subsequence Universality

    A word $u$ is a subsequence of another word $w$ if $u$ can be obtained from $w$ by deleting some of its letters. In the early 1970s, Imre Simon defined the relation $\sim_k$ (called now Simon-Congruence) as follows: two words having exactly the same set of subsequences of length at most $k$ are $\sim_k$-congruent. This relation was central in defining and analysing piecewise testable languages, but has found many applications in areas such as algorithmic learning theory, databases theory, or computational linguistics. Recently, it was shown that testing whether two words are $\sim_k$-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words $w$ and $u$ which are not $\sim_k$-equivalent, what is the minimal number of edit operations that we need to perform on $w$ in order to obtain a word which is $\sim_k$-equivalent to $u$. In this paper, we consider this problem in a setting which seems interesting: when $u$ is a $k$-subsequence universal word. A word $u$ with $\mathrm{alph}(u) = \Sigma$ is called $k$-subsequence universal if the set of subsequences of length $k$ of $u$ contains all possible words of length $k$ over $\Sigma$. As such, our results are a series of efficient algorithms computing the edit distance from $w$ to the language of $k$-subsequence universal words
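
The definitions of Simon's congruence and $k$-subsequence universality can be checked directly by enumerating subsequences. The sketch below is a brute-force illustration with exponential running time, not the linear-time test or the edit-distance algorithms the abstract refers to. The function names are hypothetical, and the alphabet of a word is assumed to be exactly the set of letters occurring in it.

```python
from itertools import combinations


def subsequences_up_to(w: str, k: int) -> set:
    """All distinct subsequences of w of length at most k."""
    return {
        "".join(letters)
        for m in range(min(k, len(w)) + 1)
        for letters in combinations(w, m)
    }


def simon_congruent(u: str, v: str, k: int) -> bool:
    """u and v are congruent when they share all subsequences of length <= k."""
    return subsequences_up_to(u, k) == subsequences_up_to(v, k)


def is_k_subsequence_universal(w: str, k: int) -> bool:
    """Check whether every length-k word over alph(w) is a subsequence of w."""
    sigma = set(w)  # assumed: the alphabet is the set of letters of w
    spectrum = {"".join(letters) for letters in combinations(w, k)}
    return len(spectrum) == len(sigma) ** k
```

As a small example, `abab` and `abba` agree on all subsequences of length at most 2 but differ at length 3, since `bab` is a subsequence of the first word only.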

    Graph and String Parameters: Connections Between Pathwidth, Cutwidth and the Locality Number

    We investigate the locality number, a recently introduced structural parameter for strings (with applications in pattern matching with variables), and its connection to two important graph-parameters, cutwidth and pathwidth. These connections allow us to show that computing the locality number is NP-hard but fixed-parameter tractable (when the locality number or the alphabet size is treated as a parameter), and can be approximated with ratio $O(\sqrt{\log(\mathrm{opt})} \log n)$. As a by-product, we also relate cutwidth via the locality number to pathwidth, which is of independent interest, since it improves the best currently known approximation algorithm for cutwidth. In addition to these main results, we also consider the possibility of greedy-based approximation algorithms for the locality number
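
For very small alphabets the locality number can be computed by exhaustive search, which may help in building intuition. The sketch assumes the usual marking-sequence definition: letters are marked one after another in some order (all occurrences at once), and the word is $k$-local if some order never produces more than $k$ maximal marked blocks. It tries every order of the letters, so it takes factorial time in the alphabet size and is unrelated to the fixed-parameter or approximation algorithms of the paper. The names `marked_blocks` and `locality_number` are hypothetical.

```python
from itertools import permutations


def marked_blocks(w: str, marked: set) -> int:
    """Count maximal runs of consecutive positions whose letter is marked."""
    blocks = 0
    previous_marked = False
    for ch in w:
        current_marked = ch in marked
        if current_marked and not previous_marked:
            blocks += 1
        previous_marked = current_marked
    return blocks


def locality_number(w: str) -> int:
    """Smallest k such that some marking order never exceeds k marked blocks."""
    letters = sorted(set(w))
    if not letters:
        return 0
    best = len(w)  # trivial upper bound: every position is its own block
    for order in permutations(letters):
        marked = set()
        worst = 0
        for letter in order:
            marked.add(letter)
            worst = max(worst, marked_blocks(w, marked))
        best = min(best, worst)
    return best
```

For `banana`, marking `n` first gives two blocks, and adding `a` merges everything after the initial `b` into one block, so the word is 2-local; no order achieves 1.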